Reading recommendation

Since I am very methodical this notebook is well structured, very detailed and quite long. I warmly recommend reading this notebook with an interactive table of contents side-panel for convenient navigation. Jupyter with jupyter_contrib_nbextensions supports these and many more useful perks (I warmly recommend!), but they are unsupported in Git or a normal Jupyter, so -

Ways to view this notebook with an interactive navigation side-panel:

  • Easiest and fastest: view this notebook in html. Git does not support viewing it online, so you can download it to your computer and open.
  • Download this notebook and view on Google Colab.
  • Best for you in long term: view this notebook in Jupyter notebook with jupyter_contrib_nbextensions installed (learn how here).

Intro

Author: oz.livneh@gmail.com

  • All rights of this project and my code are reserved to me, Oz Livneh.
  • Feel free to use - for personal use!
  • Use at your own risk 😉

Project description

DatingAI is a personal research project that I started for my interest, challenge and experience, aiming to:

  1. Learn personal taste in dating sites:
    1. How to predict personal taste (represented by the scores that a user gives to profiles) by neural networks.
    2. How different profile features - images, numerical/categorical features (age, location...), text - affect the personal taste of a user.
    3. How to represent personal taste and profile embedding, what clusters exist and how they depend on the scoring scale...
  2. Develop a personal dating agent that automatically likes/passes and messages profiles according to the user personal taste learned.

Notebook description

The architecture presented in this notebook predicts for each image the user score value given to the dating profile to which the image belongs. This assumes that all images in each profile are independent and given the same score (of their profile), which is obviously a simplification.


Why regression suits better than classification: Instead of predicting the image score class, this architecture predicts the image score value - a regression task, to take into consideration the distance between scores in the loss, e.g. for a target score of +3, predicting -1 (err=4) should be much worse than predicting +2 (err=1), but in classification the penalty on both mistakes is the same.

Supported net architectures

  • net_architecture options:
    • 'my simple CNN' - a (very) simple CNN I wrote mainly for debugging.
    • Pretrained models: 'inception v3','resnet18'.
  • freeze_pretrained_net_weights options:
    • True: weights of pretrained models are left untrained except for the last layer.
    • False: weights of pretrained models are trained entirely starting from pretrained weights.

In this demo I use Inception v3 with freeze_pretrained_net_weights=False.

Data structure

Each row in profiles_df, the output of script_Personal_cupid_scraper.py, represents a profile that has in its 'image filenames' column a nested list of image filenames, 0 or more. Therefore the dataset used in here is based on unnested_images_df, created by unnesting the image filenames from profiles_df such as each row in unnested_images_df represents an image, the the columns for each row are taken from the columns of the profile to which the image belonged. (see # Processing data, unnesting images).

Initialization

Key system requirements

In [1]:
import torch
print('torch\t\ttested on 1.1\t\tcurrent:',torch.__version__)

import sys
print('Python\t\ttested on 3.6.5\t\tcurrent:',sys.version[:sys.version.find(' ')])
import numpy as np
print('numpy\t\ttested on 1.16.3\tcurrent:',np.__version__)
import pandas as pd
print('pandas\t\ttested on 0.24.0\tcurrent:',pd.__version__)

import matplotlib
print('matplotlib\ttested on 3.0.3\t\tcurrent:',matplotlib.__version__)
torch		tested on 1.1		current: 1.1.0
Python		tested on 3.6.5		current: 3.6.5
numpy		tested on 1.16.3	current: 1.16.3
pandas		tested on 0.24.0	current: 0.24.0
matplotlib	tested on 3.0.3		current: 3.0.3

Setting main parameters

In [2]:
#--------- general -------------
debugging=True # executes the debugging sections, prints their results
# debugging=False
torch_manual_seed=0 # integer or None for no seed; for torch reproducibility, as much as possible
#torch_manual_seed=None

#--------- data -------------
data_folder_path=r'D:\My Documents\Dropbox\Python\DatingAI\Data'
images_folder_path=r'D:\My Documents\Dropbox\Python\DatingAI\Data\Images'
df_pickle_file_name='profiles_df.pickle'

"""
dataset downsampling (samples = images):
    if max_dataset_length>0: builds a dataset by sampling only 
        max_dataset_length samples from all available data. 
        requires user approval!
    if max_dataset_length<=0: not restricting dataset length - using all 
        available data
"""
max_dataset_length=1000
# max_dataset_length=0
seed_for_dataset_downsampling=0 # integer or None for no seed; for sampling max_dataset_length samples from dataset

"""
random_transforms - data augmentation by applying random transforms
        (random crop, horizontal flip, color jitter etc.) defined at 
        # Building a PyTorch dataset of images with transforms
"""
# random_transforms='train & val' # data augmentation on both train and val phases
random_transforms='train' # data augmentation only on train phase, validation is free of random transforms 
# random_transforms='none' # no data augmentation

load_all_images_to_RAM=False # default; loads images from hard drive for each sample in the batch by the PyTorch efficient (multi-processing) dataloader
# load_all_images_to_RAM=True # loads all dataset images to RAM; estimates dataset size and requires user approval

validation_ratio=0.5 # validation dataset ratio from total dataset length

batch_size_int_or_ratio_float=8 # if int: each batch will contain this number of samples
#batch_size_int_or_ratio_float=1e-2 # if float: batch_size=round(batch_size_over_dataset_length*dataset_length)
data_workers=0 # 0 means no multiprocessing in dataloaders
#data_workers='cpu cores' # sets data_workers=multiprocessing.cpu_count()

shuffle_dataset_indices_for_split=True # dataset indices for dataloaders are shuffled before splitting to train and validation indices
#shuffle_dataset_indices_for_split=False
dataset_shuffle_random_seed=0 # numpy seed for sampling the indices for the dataset, before splitting to train and val dataloaders
#dataset_shuffle_random_seed=None
dataloader_shuffle=True # samples are shuffled inside each dataloader, on each epoch
#dataloader_shuffle=False

#--------- net -------------
# architecture_is_a_pretrained_model=False
# net_architecture='my simple CNN'

architecture_is_a_pretrained_model=True
net_architecture='inception v3'
#net_architecture='resnet18'

# freeze_pretrained_net_weights=True # freezes pretrained model weights except the last layer
freeze_pretrained_net_weights=False # trains pretrained models entirely, all weights, starting from pretrained values

loss_name='MSE'

#--------- training -------------
train_model_else_load_weights=True
#train_model_else_load_weights=False # instead of training, loads a pre-trained model and uses it

force_train_evaluation_after_each_epoch=True # adding evaluation of the training dataset after each epoch finishes training
# force_train_evaluation_after_each_epoch=False # default

epochs=15
learning_rate=2e-4

optimizer_name='SGD'
SGD_momentum=0.7 # default: 0.9

# optimizer_name='Adam'
Adam_betas=(0.7,0.999) # default: (0.9,0.999)

lr_scheduler_decay_factor=0.9 # applies to all optimizers; on each lr_scheduler_step_size epochs, learning_rate*=lr_scheduler_decay_factor
lr_scheduler_step_size=1

best_model_criterion='min val epoch MSE' # criterion for choosing best net weights during training as the final weights
return_to_best_weights_in_the_end=True # when training complets, loads weights of the best net, definied by best_model_criterion
#return_to_best_weights_in_the_end=False

period_in_seconds_to_log_loss=30 # <=0 means no logging during training, else: inter-epoch logging and reporting loss and metrics during training
#plot_realtime_stats_on_logging=True # incomplete implementation!
plot_realtime_stats_on_logging=False
#plot_realtime_stats_after_each_epoch=True
plot_realtime_stats_after_each_epoch=False
#plot_loss_in_log_scale=True
plot_loss_in_log_scale=False

#offer_mode_saving=True # offer model weights saving ui after training (only if train_model_else_load_weights=True)
offer_mode_saving=False
models_folder_path='D:\My Documents\Dropbox\Python\DatingAI\Data\Saved Models'

Imports

I could hide all my function and class definitions in another script, but I want this notebook to be clear and self-contained. Also, Github now supports jumping to definitions!

In [3]:
import logging
logging.basicConfig(format='%(asctime)s %(funcName)s (%(levelname)s): %(message)s',
                   datefmt='%Y-%m-%d %H:%M:%S')
logger=logging.getLogger('main logger')
logger.setLevel(logging.INFO)

import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import PIL
import random
from time import time
import copy
import multiprocessing
if data_workers=='cpu cores':
    data_workers=multiprocessing.cpu_count()

import torch
torch.manual_seed(torch_manual_seed)

import torchvision
import torch.nn as nn
import torch.nn.functional as F

import DatingAI
DatingAI.torch.manual_seed(torch_manual_seed)
device=DatingAI.device
2019-07-11 21:56:19 <module> (INFO): torch is using cuda:0 (GeForce GTX 960M)
2019-07-11 21:56:19 <module> (INFO): DatingAI imported
In [4]:
# a charm for interactive plotting in Jupyter notebook (useful for zooming, rotating 3D plots):
%matplotlib notebook

Pre-processing data, analysis

Reading data, unnesting images

Data structure is demonstrated in # 1.4 Data structure.

In [5]:
profiles_df_path=os.path.join(data_folder_path,df_pickle_file_name)
profiles_df=pd.read_pickle(profiles_df_path)

# finding score column (column name in the format of 'score (levels=%d)'%score_levels)
score_column_name=None
for column in profiles_df.columns:
    if 'score' in column:
        score_column_name=column
        break
if score_column_name==None:
    raise RuntimeError("no existing column name in profiles_df contains 'score'!")
    
# unnesting images
unnested_images_dict={}
for profile_index in range(len(profiles_df)):
    row_series=profiles_df.iloc[profile_index,:]
    profile_id=row_series['profile id']
    profile_score=row_series[score_column_name]
    for filename in row_series['image filenames']:
        if filename=='pq_400.pn': # skipping this blank profile image (in a strange format)
            continue
        if os.path.isfile(os.path.join(images_folder_path,filename)):
            unnested_images_dict.update({len(unnested_images_dict):{
                'profile index':profile_index,
                'profile id':profile_id,
                'score':profile_score,
                'image filename':filename}})
        else:
            logging.warning(f'profile {profile_index}: {filename} not found -> skipping image!')
unnested_images_df=pd.DataFrame.from_dict(unnested_images_dict,orient='index')

if max_dataset_length>0 and max_dataset_length<len(unnested_images_df):
    user_data_approval=input('ATTENTION: downsampling is chosen - building a dataset by sampling only max_dataset_length=%d samples from all available data! approve? y/[n] '%(round(max_dataset_length)))
    if user_data_approval!='y':
        raise RuntimeError('user did not approve dataset max_dataset_length sampling!')
    random.seed(seed_for_dataset_downsampling)
    sampled_indices=random.sample(range(len(unnested_images_df)),max_dataset_length)
    unnested_images_df=unnested_images_df.iloc[sampled_indices]

logger.info('comleted unnesting images from profiles_df to unnested_images_df of length %d'%(len(unnested_images_df)))
ATTENTION: downsampling is chosen - building a dataset by sampling only max_dataset_length=1000 samples from all available data! approve? y/[n] y
2019-07-11 21:56:27 <module> (INFO): comleted unnesting images from profiles_df to unnested_images_df of length 1000

Debugging: checking image shapes

In [6]:
image_num_to_sample=5
# end of inputs ---------------------------------------------------------------
if debugging:
    logger.info('checking image shapes of %d sampled images'%image_num_to_sample)
    sampled_indices_list=random.sample(range(len(unnested_images_df)),image_num_to_sample)
    for i in sampled_indices_list:
        df_row=unnested_images_df.iloc[i,:]
        image_filename=df_row['image filename']
        image_array=plt.imread(os.path.join(images_folder_path,image_filename))
        print(f'{image_filename} shape:',image_array.shape)
2019-07-11 21:56:27 <module> (INFO): checking image shapes of 5 sampled images
18298895809454886132.jpeg shape: (400, 400, 3)
15594396992572230250.jpeg shape: (400, 400, 3)
10478668768338634515.jpeg shape: (400, 400, 3)
14226781409400696191.jpeg shape: (400, 400, 3)
13567986817232578874.jpeg shape: (400, 400, 3)

Building a PyTorch dataset of images with transforms

Dataset building

In this section the transforms are defined (not all are random, some are required independently of data augmentation), then a dataset is built, tested, and only later training and validation datasets and dataloaders are built.

If load_all_images_to_RAM=True, the size of the dataset is estimated and then the user can choose to load all images to the RAM.

Data augmentation

  • Random transforms used below: random crop, random horizontal flip, color jitter.
  • random_transforms set in # Setting main parameteres controls data augmentation:
    • random_transforms='train & val': data augmentation on both train and val phases. The dataset that is created here is later split to training and validation datasets (and dataloaders).
    • random_transforms='train': data augmentation only on train phase, validation is free of random transforms. The dataset that is created here with transforms, and later new separate train/val datasets (and dataloaders) are created, with/without random transforms.
    • random_transforms='train': no data augmentation. The dataset that is created here is later split to training and validation datasets (and dataloaders).
In [7]:
n_to_sample_for_data_size_estimation=10 # only if load_all_images_to_RAM=True was set
"""torchvision.transforms.ToTensor() Converts a PIL Image or numpy.ndarray (H x W x C) in the 
    range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the 
    range [0.0, 1.0] if the PIL Image belongs to one of the modes 
    (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has 
    dtype = np.uint8
In the other cases, tensors are returned without scaling
source: https://pytorch.org/docs/stable/torchvision/transforms.html
"""
if random_transforms=='none':
    random_transforms_ui=input('random_transforms=False was set, meaning no data augmentation, approve? [y]/n ')
    if random_transforms_ui=='n':
        raise RuntimeError('user did not approve no data augmentation, aborting')

if architecture_is_a_pretrained_model:
    if net_architecture=='inception v3':
        input_size_for_pretrained=299
    else:
        input_size_for_pretrained=224
    
    transform_func_with_random=torchvision.transforms.Compose([
            torchvision.transforms.Resize(input_size_for_pretrained+10),
            torchvision.transforms.RandomCrop(input_size_for_pretrained),
            torchvision.transforms.ColorJitter(brightness=0.1,contrast=0.1,saturation=0,hue=0),
            torchvision.transforms.RandomHorizontalFlip(p=0.5),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225]), # required for pre-trained torchvision models!
            ])
    transform_func_no_random=torchvision.transforms.Compose([
            torchvision.transforms.Resize(input_size_for_pretrained),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225]), # required for pre-trained torchvision models!
            ])
else:
    transform_func_with_random=torchvision.transforms.Compose([
#            torchvision.transforms.Resize(400),
            torchvision.transforms.RandomCrop(390),
            torchvision.transforms.ColorJitter(brightness=0.1,contrast=0.1,saturation=0,hue=0),
            torchvision.transforms.RandomHorizontalFlip(p=0.5),
            torchvision.transforms.ToTensor(),
            ])
    transform_func_no_random=torchvision.transforms.Compose([
            torchvision.transforms.Resize(390),
            torchvision.transforms.ToTensor(),
            ])
# end of inputs ---------------------------------------------------------------

if random_transforms=='none':
    transform_func=transform_func_no_random
else:
    transform_func=transform_func_with_random
dataset=DatingAI.unnested_images_dataset(unnested_images_df=unnested_images_df,
    images_folder_path=images_folder_path,transform_func=transform_func)

if load_all_images_to_RAM:
    # estimating dataset size based on sampled samples(images)
    sampled_sample_indices=random.sample(range(len(dataset)),n_to_sample_for_data_size_estimation)
    sampled_images_dict_in_RAM=DatingAI.build_images_dict_in_RAM(
        image_filenames_list=list(unnested_images_df['image filename'].iloc[sampled_sample_indices]),
        images_folder_path=images_folder_path)
    image_np_arrays_size_MB=sum([sys.getsizeof(np.array(image)) for image in sampled_images_dict_in_RAM.values()])/1e6
    expected_sampled_images_dict_in_RAM_size_MB=image_np_arrays_size_MB/n_to_sample_for_data_size_estimation*len(dataset)
    user_decision_RAM=input('load_all_images_to_RAM=True was set, estimated dataset size based on %d random samples: %.1eMB, load all images to RAM? y/[n] '%(
            n_to_sample_for_data_size_estimation,expected_sampled_images_dict_in_RAM_size_MB))
    if user_decision_RAM=='y':
        logger.info('started loading all images to RAM')
        images_dict_in_RAM=DatingAI.build_images_dict_in_RAM(
            image_filenames_list=list(unnested_images_df['image filename']),
            images_folder_path=images_folder_path)
        dataset=DatingAI.unnested_images_dataset(
            unnested_images_df=unnested_images_df,
            images_dict_for_RAM_loading=images_dict_in_RAM,
            transform_func=transform_func)
        image_np_arrays_size_MB=sum([sys.getsizeof(np.array(image)) for image in images_dict_in_RAM.values()])/1e6
        logger.info('completed loading all images to RAM, size: %.1eMB'%image_np_arrays_size_MB)
    else:
        logger.info('user disapproved loading all dataset to RAM, keeping it on the hard drive and loading with a dataloader')

sample_size=dataset[0]['image'].size()
sample_pixels_per_channel=sample_size[1]*sample_size[2]
sample_pixels_all_channels=sample_size[0]*sample_pixels_per_channel
logger.info('set a PyTorch dataset of length %d, input size (assuming it is constant): (%d,%d,%d)'%(
        len(unnested_images_df),sample_size[0],sample_size[1],sample_size[2]))
2019-07-11 21:56:27 <module> (INFO): set a PyTorch dataset of length 1000, input size (assuming it is constant): (3,299,299)

Debugging: verifying dataset by plotting

In [8]:
#sample_indices_to_plot=range(20) # for dataset plotting verification
random.seed(0)
sample_indices_to_plot=random.sample(range(len(unnested_images_df)),20)
images_per_row=5
figure_size=(10,10) # (width,height) in inches
# end of inputs ---------------------------------------------------------------
if debugging:
    DatingAI.plot_unnested_images_dataset(sample_indices_to_plot,dataset,
                                 figure_size,images_per_row,
                                 image_format='PIL->torch',normalize=True)
    plt.suptitle('plotting from pytorch dataset, 1st time')
    
    if random_transforms!='none':
        DatingAI.plot_unnested_images_dataset(sample_indices_to_plot,dataset,
                                     figure_size,images_per_row,
                                     image_format='PIL->torch',normalize=True)
        plt.suptitle('plotting from pytorch dataset, 2st time (to see random transforms)')
    
    DatingAI.plot_unnested_images_df(sample_indices_to_plot,unnested_images_df,
                            images_folder_path,figure_size,images_per_row)
    plt.suptitle('plotting from raw data - unnested_images_df')

Train-val splitting, analyzing target distributions

In [9]:
target_is_continuous=False # target is not continuous, profile scores are discrete
normalization='over total' # heights=counts/sum(discrete_hist)
opacity=0.6
# end of inputs ---------------------------------------------------------------

# splitting
dataset_length=len(unnested_images_df)
dataset_indices=list(range(dataset_length))
split_index=int((1-validation_ratio)*dataset_length)
if shuffle_dataset_indices_for_split:
    np.random.seed(dataset_shuffle_random_seed)
    np.random.shuffle(dataset_indices)
train_indices=dataset_indices[:split_index]
val_indices=dataset_indices[split_index:]
dataset_indices={'train':train_indices,'val':val_indices}
logger.info('dataset indices split to training and validation, with validation_ratio=%.1f, lengths: (train,val)=(%d,%d)'%(
        validation_ratio,len(train_indices),len(val_indices)))

# plotting target distibutions
plt.figure()
for phase in ['train','val']:
    targets_list=unnested_images_df.iloc[dataset_indices[phase]]['score'].values
    DatingAI.easy_hist(targets_list,distribution_is_continuous=target_is_continuous,
              normalization=normalization,label=phase,opacity=opacity)
plt.title('training and validation target distributions')
plt.xlabel('target values')
plt.legend(loc='best');
2019-07-11 21:56:28 <module> (INFO): dataset indices split to training and validation, with validation_ratio=0.5, lengths: (train,val)=(500,500)

This figure presents a few important things:

  • My scoring: integers from -5 to +5, without 0 for a practical reason - my Personal Cupid Scraper automatically likes/dislikes a profile if the score given by the user is positive/negative, to advance to the next profile that the site suggests. Zero stops the script (intentionally), it is unwanted since it is undetermined. I chose discrete scoring since I wanted to leave the option to perform classification.
  • The training and validation distributions are similar enough, which is important.
  • The training and validation distributions are clearly not Gaussian.

Building train,val datsets and dataloaders

In [10]:
# setting batch size
if isinstance(batch_size_int_or_ratio_float,int):
    batch_size=batch_size_int_or_ratio_float
elif isinstance(batch_size_int_or_ratio_float,float):
    batch_size=round(batch_size_int_or_ratio_float*dataset_length)
else:
    raise RuntimeError('unsupported batch_size input!')
if batch_size<1:
    batch_size=1
    logger.warning('batch_size=round(batch_size_over_dataset_length*dataset_length)<1 so batch_size=1 was set')
if batch_size==1:
    user_batch_size=input('got batch_size=1, may cause errors, enter a new batch size equal or larger than 1, or smaller than 1 to abort: ')
    if user_batch_size<1:
        raise RuntimeError('aborted by user batch size decision')
    else:
        batch_size=round(user_batch_size)

# building datasets
if random_transforms=='train': # means applying random transforms only on train, so separate datasets must be created
    if load_all_images_to_RAM and user_decision_RAM=='y':
        train_dataset=DatingAI.unnested_images_dataset(
                unnested_images_df=unnested_images_df.iloc[train_indices],
                images_dict_for_RAM_loading=images_dict_in_RAM,
                transform_func=transform_func)
        val_dataset=DatingAI.unnested_images_dataset(
                unnested_images_df=unnested_images_df.iloc[val_indices],
                images_dict_for_RAM_loading=images_dict_in_RAM,
                transform_func=transform_func)
    else:
        train_dataset=DatingAI.unnested_images_dataset(
            unnested_images_df=unnested_images_df.iloc[train_indices],
            images_folder_path=images_folder_path,
            transform_func=transform_func_with_random)
        val_dataset=DatingAI.unnested_images_dataset(
            unnested_images_df=unnested_images_df.iloc[val_indices],
            images_folder_path=images_folder_path,
            transform_func=transform_func_no_random)
else:
    dataset_to_split=dataset
    # splitting the dataset to train and val
    train_dataset=torch.utils.data.Subset(dataset_to_split,train_indices)
    val_dataset=torch.utils.data.Subset(dataset_to_split,val_indices)

# building the train and val dataloaders
train_dataloader=torch.utils.data.DataLoader(train_dataset,batch_size=batch_size,
                        num_workers=data_workers,shuffle=dataloader_shuffle)
val_dataloader=torch.utils.data.DataLoader(val_dataset,batch_size=batch_size,
                        num_workers=data_workers,shuffle=dataloader_shuffle)

# structuring
datasets={'train':train_dataset,'val':val_dataset}
dataset_samples_number={'train':len(train_dataset),'val':len(val_dataset)}

dataloaders={'train':train_dataloader,'val':val_dataloader}
dataloader_batches_number={'train':len(train_dataloader),'val':len(val_dataloader)}

logger.info('dataset split to training and validation datasets and dataloaders with validation_ratio=%.1f, lengths: (train,val)=(%d,%d)'%(
        validation_ratio,dataset_samples_number['train'],dataset_samples_number['val']))
2019-07-11 21:56:28 <module> (INFO): dataset split to training and validation datasets and dataloaders with validation_ratio=0.5, lengths: (train,val)=(500,500)

Debugging: verifying dataloaders

In [11]:
images_per_row=4
normalize=True # normalizes all pixels in each channel to be in [0,1]. needed for plotting, after the strange normalization that torchvision models require
figure_size=(8,6) # (width,height) in inches
# end of inputs ---------------------------------------------------------------

if debugging:
    if __name__=='__main__' or data_workers==0:
        for phase in ['train','val']:
            batch=next(iter(dataloaders[phase]))
            DatingAI.plot_unnested_images_batch(batch,figure_size=figure_size,
                            images_per_row=images_per_row,normalize=normalize)
            plt.suptitle('plotting a batch from the %s dataloader'%phase)
    else:
        logger.warning('cannot use multiprocessing (data_workers>0 in dataloaders) in Windows when executed not as main')

Setting the model

In [12]:
if net_architecture=='my simple CNN':
    class my_CNN(nn.Module):
        def __init__(self):
            super(my_CNN, self).__init__()
            self.conv1 = nn.Conv2d(3,6,8,stride=4)
            self.pool = nn.MaxPool2d(2,2)
            self.conv2 = nn.Conv2d(6,16,8,stride=4)
            self.fc1 = nn.Linear(400,120)
            self.fc2 = nn.Linear(120,84)
            self.fc3 = nn.Linear(84,1)
        def forward(self,x):
            x=F.relu(self.conv1(x))
            x=self.pool(x)
            x=F.relu(self.conv2(x))
            x=self.pool(x)            
            x=x.view(-1,np.array(x.shape[1:]).prod()) # don't use x.view(batch_size,-1), which fails for batches smaller than batch_size (at the end of the dataloader)
            x=F.relu(self.fc1(x))
            x=F.relu(self.fc2(x))
            x=self.fc3(x)
            return x
    model=my_CNN()
    parameters_to_optimize=model.parameters()
elif net_architecture=='resnet18':
    model=torchvision.models.resnet18(pretrained=True)
    if freeze_pretrained_net_weights:
        for param in model.parameters():
            param.requires_grad=False
        parameters_to_optimize=model.fc.parameters()
    else:
        parameters_to_optimize=model.parameters()
    model.fc=nn.Linear(model.fc.in_features,1) # Parameters of newly constructed modules have requires_grad=True by default
elif net_architecture=='inception v3':
    model=torchvision.models.inception_v3(pretrained=True)
    
    if freeze_pretrained_net_weights:
        for param in model.parameters():
            param.requires_grad=False
    # Parameters of newly constructed modules have requires_grad=True by default:
    model.AuxLogits.fc=nn.Linear(768,1)
    model.fc=nn.Linear(2048,1)
    
    if freeze_pretrained_net_weights:
        parameters_to_optimize=[]
        for name,parameter in model.named_parameters():
            if parameter.requires_grad:
                parameters_to_optimize.append(parameter)
    else:
        parameters_to_optimize=model.parameters()
else:
    raise RuntimeError('untreated net_architecture!')

model=model.to(device)
total_weights_num=sum(p.numel() for p in model.parameters())
trainable_weights_num=sum(p.numel() for p in model.parameters() if p.requires_grad)

logger.info("set '%s' net on %s, trainable/total weigths: %.1e/%.1e"%(
    net_architecture,device,trainable_weights_num,total_weights_num))

if loss_name=='MSE':
    loss_fn=nn.MSELoss(reduction='mean').to(device)
else:
    raise RuntimeError('untreated loss_name input')
if optimizer_name=='SGD':
    optimizer=torch.optim.SGD(parameters_to_optimize,lr=learning_rate,momentum=SGD_momentum)
elif optimizer_name=='Adam':
    optimizer=torch.optim.Adam(parameters_to_optimize,lr=learning_rate,betas=Adam_betas)
else:
    raise RuntimeError('untreated optimizer_name input')

scheduler=torch.optim.lr_scheduler.StepLR(optimizer,
    step_size=lr_scheduler_step_size,gamma=lr_scheduler_decay_factor)
2019-07-11 21:56:36 <module> (INFO): set 'inception v3' net on cuda:0, trainable/total weigths: 2.4e+07/2.4e+07

Debugging: verifying model architecture

In [13]:
# comment last lines from Net.forward() to check outputs of earlier lines

if debugging:
    if __name__=='__main__' or data_workers==0:
        batch=next(iter(dataloaders['train']))
        labels=batch['profile score'].to(device).unsqueeze(1).float()
        
        images=batch['image']
        images=images.to(device)
        print('batch images.shape:',images.shape)
        
        model.eval()
        outputs=model(images)
        print('outputs.shape:',outputs.shape)
        with torch.set_grad_enabled(False):
            MSE=((outputs.flatten()-labels)**2).sum()/len(labels)
            sqrt_MSE=MSE**0.5
            sqrt_MSE=sqrt_MSE.item()
        print('batch sqrt(MSE):',sqrt_MSE)
    else:
        logger.warning('cannot use multiprocessing (data_workers>0 in dataloaders) in Windows when executed not as main')
batch images.shape: torch.Size([8, 3, 299, 299])
outputs.shape: torch.Size([8, 1])
batch sqrt(MSE): 7.41342306137085

Training the model, analysis

Training

In [14]:
def model_evaluation(model,dataloader,loss_fn):
    model.eval() # set model to evaluate mode
    
    epoch_loss=0.0 # must be a float
    epoch_samples_number=0
    label_arrays_list=[]
    output_arrays_list=[]
    
    for i_batch,batch in enumerate(dataloader):        
        images=batch['image'].to(device)                
        labels=batch['profile score'].to(device).unsqueeze(1).float()
        
        # forward
        with torch.set_grad_enabled(False): # if phase=='train' it tracks tensor history for grad calc
            outputs=model(images)
            loss=loss_fn(outputs,labels)
        
        # accumulating
        samples_number=len(labels)
        epoch_samples_number+=samples_number        
        current_loss=loss.item()*samples_number # the loss is averaged across samples in each minibatch, so it is multiplied to return to a total
        epoch_loss+=current_loss
        
        label_arrays_list.append(labels.flatten().cpu().numpy())
        output_arrays_list.append(outputs.flatten().cpu().numpy())
        
    # post-processing
    epoch_loss_per_sample=epoch_loss/dataset_samples_number[phase]
    labels_array=np.concatenate(label_arrays_list)
    outputs_array=np.concatenate(output_arrays_list)
    return labels_array,outputs_array,epoch_loss_per_sample

# if force_train_evaluation_after_each_epoch:
#     train_evaluation_ui=input('force_train_evaluation_after_each_epoch=True was set, it is inefficient (but useful for training analysis), approve? y/[n] ')
#     if train_evaluation_ui!='y':
#         raise RuntimeError('user did not approve force_train_evaluation_after_each_epoch=True, aborting!')

if plot_realtime_stats_on_logging or plot_realtime_stats_after_each_epoch:
    logger.warning('plotting from inside the net loop is not working, should be debugged...')

if train_model_else_load_weights and (__name__=='__main__' or data_workers==0):
    stats_dict={'train':{'running metrics':{},'epoch total running metrics':{},
                         'evaluation metrics':{}},
                     'val':{'evaluation metrics':{}}}
    
    if torch.cuda.is_available():
        logger.info('torch is using %s (%s)'%(device,torch.cuda.get_device_name(device=0)))
    else:
        logger.info('torch is using %s'%(device))
    
    # model pre-training evaluation
    logger.info('started model pre-training evaluation')
    for phase in ['train','val']:
        dataloader=dataloaders[phase]
        labels_array,outputs_array,epoch_loss_per_sample=\
                            model_evaluation(model,dataloaders[phase],loss_fn)
        errors_array=labels_array-outputs_array
        epoch_MSE=(errors_array**2).mean()
        logger.info('(pre-training, %s) loss per sample: %.3e, sqrt(MSE): sqrt(%.3e)=%.3e'%(
                        phase,epoch_loss_per_sample,epoch_MSE,epoch_MSE**0.5))
        stats_dict[phase]['evaluation metrics'].update({0:
                            {'loss per sample':epoch_loss_per_sample,
                             'MSE':epoch_MSE}})

    total_batches=epochs*(dataloader_batches_number['train']+dataloader_batches_number['val'])
    period_already_logged=0
    logger.info('started model training')
    print('-'*10)
    tic=time()
    
    # model training
    for epoch in range(epochs):
        for phase in ['train','val']:
            if phase == 'train':
                scheduler.step()
                model.train() # set model to training mode
            else:
                model.eval() # set model to evaluate mode
            
            epoch_loss=0.0 # must be a float
            epoch_squared_error=0.0
            samples_processed_since_last_log=0
            loss_since_last_log=0.0 # must be a float
            squared_error_since_last_log=0.0
            
            for i_batch,batch in enumerate(dataloaders[phase]):
                images=batch['image'].to(device)                
                labels=batch['profile score'].to(device).unsqueeze(1).float()
                
                optimizer.zero_grad() # zero the parameter gradients
                
                # forward
                with torch.set_grad_enabled(phase=='train'): # if phase=='train' it tracks tensor history for grad calc
                    if net_architecture=='inception v3' and phase=='train':
                        outputs,aux_outputs=model(images)
                        loss1=loss_fn(outputs,labels)
                        loss2=loss_fn(aux_outputs,labels)
                        loss=loss1+0.4*loss2 # in train mode it has an auxiliary output (to deal with gradient decay); see https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
#                         loss=loss1 # in train mode it has an auxiliary output (to deal with gradient decay); see https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
                    else:
                        outputs=model(images)
                        loss=loss_fn(outputs,labels)
                    if torch.isnan(loss):
                        raise RuntimeError('reached NaN loss - aborting training!')
                    # backward + optimize if training
                    if phase=='train':
                        loss.backward()
                        optimizer.step()
                
                # accumulating stats
                samples_number=len(labels)
                samples_processed_since_last_log+=samples_number
                
                current_loss=loss.item()*samples_number # the loss is averaged across samples in each minibatch, so it is multiplied to return to a total
                epoch_loss+=current_loss
                loss_since_last_log+=current_loss
                
                with torch.set_grad_enabled(False):
                    batch_squared_error=((outputs-labels)**2).sum().item()
                epoch_squared_error+=batch_squared_error
                squared_error_since_last_log+=batch_squared_error
                
                # logging running stats
                if phase=='train' and period_in_seconds_to_log_loss>0:
                    passed_seconds=time()-tic
                    period=passed_seconds//period_in_seconds_to_log_loss
                    if period>period_already_logged:
                        period_already_logged=period
                        loss_since_last_log_per_sample=loss_since_last_log/samples_processed_since_last_log
                        MSE_since_last_log=squared_error_since_last_log/samples_processed_since_last_log
                
                        completed_batches=epoch*(dataloader_batches_number['train']+dataloader_batches_number['val'])+(i_batch+1)
                        completed_batches_progress=completed_batches/total_batches
                        
                        
                        logger.info('(epoch %d/%d, batch %d/%d, %s, running) loss per sample : %.3e, sqrt(MSE): sqrt(%.3e)=%.3e'%(
                                    epoch+1,epochs,i_batch+1,dataloader_batches_number[phase],phase,
                                    loss_since_last_log_per_sample,
                                    MSE_since_last_log,MSE_since_last_log**0.5))
                        
                        partial_epoch=epoch+completed_batches_progress
                        stats_dict[phase]['running metrics'].update({partial_epoch:
                            {'batch':i_batch+1,'loss per sample':loss_since_last_log_per_sample,
                             'MSE':MSE_since_last_log}})
        
                        loss_since_last_log=0.0 # must be a float
                        squared_error_since_last_log=0.0
                        samples_processed_since_last_log=0
            
            # accumulating epoch stats
            epoch_loss_per_sample=epoch_loss/dataset_samples_number[phase]
            epoch_MSE=epoch_squared_error/dataset_samples_number[phase]
            
            if phase=='train': # saving running stats
                stats_dict[phase]['epoch total running metrics'].update({epoch+1:
                            {'loss per sample':epoch_loss_per_sample,
                             'MSE':epoch_MSE}})
                if force_train_evaluation_after_each_epoch: # train dataloader evaluation
                    labels_array,outputs_array,epoch_loss_per_sample=\
                        model_evaluation(model,dataloaders[phase],loss_fn)
                    errors_array=labels_array-outputs_array
                    epoch_MSE=(errors_array**2).mean()
                    stats_dict[phase]['evaluation metrics'].update({epoch+1:
                            {'loss per sample':epoch_loss_per_sample,
                             'MSE':epoch_MSE}})
            else: # val dataloader evaluation
                stats_dict[phase]['evaluation metrics'].update({epoch+1:
                            {'loss per sample':epoch_loss_per_sample,
                             'MSE':epoch_MSE}})
            
            if phase=='val': # updating best model results
                if best_model_criterion=='min val epoch MSE':
                    best_criterion_current_value=epoch_MSE
                    if epoch==0:
                        best_criterion_best_value=best_criterion_current_value
                        best_model_wts=copy.deepcopy(model.state_dict())
                        best_epoch=epoch
                    else:
                        if best_criterion_current_value<best_criterion_best_value:
                            best_criterion_best_value=best_criterion_current_value
                            best_model_wts=copy.deepcopy(model.state_dict())
                            best_epoch=epoch
            
            # logging evaluation stats
            if phase=='val' or force_train_evaluation_after_each_epoch:
                completed_epochs_progress=(epoch+1)/epochs
                passed_seconds=time()-tic
                expected_seconds=passed_seconds/completed_epochs_progress*(1-completed_epochs_progress)
                expected_remainder_time=DatingAI.remainder_time(expected_seconds)
                
                logger.info('(epoch %d/%d, %s, evaluation) epoch loss per sample: %.3e, epoch sqrt(MSE): sqrt(%.3e)=%.3e\n\tProgress: %.2f%%, ETA: %dh:%dm:%.0fs'%(
                                    epoch+1,epochs,phase,
                                    epoch_loss_per_sample,
                                    epoch_MSE,epoch_MSE**0.5,
                                    100*completed_epochs_progress,
                                    expected_remainder_time.hours,
                                    expected_remainder_time.remainder_minutes,
                                    expected_remainder_time.remainder_seconds))
                print('-'*10)
    toc=time()
    elapsed_sec=toc-tic

    logger.info('finished training %d epochs in %dm:%.1fs'%(
            epochs,elapsed_sec//60,elapsed_sec%60))
    if return_to_best_weights_in_the_end:
        model.load_state_dict(best_model_wts)
        logger.info("loaded weights of best model according to '%s' criterion: best value %.3f achieved in epoch %d"%(
                best_model_criterion,best_criterion_best_value,best_epoch+1))

else: # train_model_else_load_weights==False
    model_name_ui=input('model weights file name to load: ')
    model_weights_file_path=os.path.join(models_folder_path,model_name_ui)
    if not os.path.isfile(model_weights_file_path):
        raise RuntimeError('%model_weights_path does not exist!')
    model_weights=torch.load(model_weights_file_path)
    model.load_state_dict(model_weights)
    logger.info('model weights from %s were loaded'%model_weights_file_path)
2019-07-11 21:56:37 <module> (INFO): torch is using cuda:0 (GeForce GTX 960M)
2019-07-11 21:56:37 <module> (INFO): started model pre-training evaluation
2019-07-11 21:56:51 <module> (INFO): (pre-training, train) loss per sample: 8.017e+00, sqrt(MSE): sqrt(8.017e+00)=2.831e+00
2019-07-11 21:57:03 <module> (INFO): (pre-training, val) loss per sample: 8.847e+00, sqrt(MSE): sqrt(8.847e+00)=2.974e+00
2019-07-11 21:57:03 <module> (INFO): started model training
----------
2019-07-11 21:57:33 <module> (INFO): (epoch 1/15, batch 50/63, train, running) loss per sample : 1.018e+01, sqrt(MSE): sqrt(7.231e+00)=2.689e+00
2019-07-11 21:57:56 <module> (INFO): (epoch 1/15, train, evaluation) epoch loss per sample: 6.473e+00, epoch sqrt(MSE): sqrt(6.473e+00)=2.544e+00
	Progress: 6.67%, ETA: 0h:12m:23s
----------
2019-07-11 21:58:10 <module> (INFO): (epoch 1/15, val, evaluation) epoch loss per sample: 7.404e+00, epoch sqrt(MSE): sqrt(7.404e+00)=2.721e+00
	Progress: 6.67%, ETA: 0h:15m:42s
----------
2019-07-11 21:58:11 <module> (INFO): (epoch 2/15, batch 1/63, train, running) loss per sample : 6.177e+00, sqrt(MSE): sqrt(4.504e+00)=2.122e+00
2019-07-11 21:58:33 <module> (INFO): (epoch 2/15, batch 37/63, train, running) loss per sample : 9.606e+00, sqrt(MSE): sqrt(6.965e+00)=2.639e+00
2019-07-11 21:59:04 <module> (INFO): (epoch 2/15, train, evaluation) epoch loss per sample: 5.999e+00, epoch sqrt(MSE): sqrt(5.999e+00)=2.449e+00
	Progress: 13.33%, ETA: 0h:13m:9s
----------
2019-07-11 21:59:18 <module> (INFO): (epoch 2/15, val, evaluation) epoch loss per sample: 7.327e+00, epoch sqrt(MSE): sqrt(7.327e+00)=2.707e+00
	Progress: 13.33%, ETA: 0h:14m:41s
----------
2019-07-11 21:59:19 <module> (INFO): (epoch 3/15, batch 1/63, train, running) loss per sample : 1.467e+01, sqrt(MSE): sqrt(1.044e+01)=3.231e+00
2019-07-11 21:59:33 <module> (INFO): (epoch 3/15, batch 23/63, train, running) loss per sample : 7.761e+00, sqrt(MSE): sqrt(5.733e+00)=2.394e+00
2019-07-11 22:00:12 <module> (INFO): (epoch 3/15, train, evaluation) epoch loss per sample: 5.399e+00, epoch sqrt(MSE): sqrt(5.399e+00)=2.324e+00
	Progress: 20.00%, ETA: 0h:12m:38s
----------
2019-07-11 22:00:26 <module> (INFO): (epoch 3/15, val, evaluation) epoch loss per sample: 7.249e+00, epoch sqrt(MSE): sqrt(7.249e+00)=2.692e+00
	Progress: 20.00%, ETA: 0h:13m:35s
----------
2019-07-11 22:00:27 <module> (INFO): (epoch 4/15, batch 1/63, train, running) loss per sample : 6.963e+00, sqrt(MSE): sqrt(5.083e+00)=2.255e+00
2019-07-11 22:00:33 <module> (INFO): (epoch 4/15, batch 10/63, train, running) loss per sample : 6.897e+00, sqrt(MSE): sqrt(5.107e+00)=2.260e+00
2019-07-11 22:01:03 <module> (INFO): (epoch 4/15, batch 58/63, train, running) loss per sample : 7.929e+00, sqrt(MSE): sqrt(5.890e+00)=2.427e+00
2019-07-11 22:01:20 <module> (INFO): (epoch 4/15, train, evaluation) epoch loss per sample: 4.712e+00, epoch sqrt(MSE): sqrt(4.712e+00)=2.171e+00
	Progress: 26.67%, ETA: 0h:11m:49s
----------
2019-07-11 22:01:34 <module> (INFO): (epoch 4/15, val, evaluation) epoch loss per sample: 7.299e+00, epoch sqrt(MSE): sqrt(7.299e+00)=2.702e+00
	Progress: 26.67%, ETA: 0h:12m:28s
----------
2019-07-11 22:01:35 <module> (INFO): (epoch 5/15, batch 1/63, train, running) loss per sample : 9.791e+00, sqrt(MSE): sqrt(7.176e+00)=2.679e+00
2019-07-11 22:02:03 <module> (INFO): (epoch 5/15, batch 45/63, train, running) loss per sample : 6.609e+00, sqrt(MSE): sqrt(4.872e+00)=2.207e+00
2019-07-11 22:02:29 <module> (INFO): (epoch 5/15, train, evaluation) epoch loss per sample: 3.877e+00, epoch sqrt(MSE): sqrt(3.877e+00)=1.969e+00
	Progress: 33.33%, ETA: 0h:10m:52s
----------
2019-07-11 22:02:43 <module> (INFO): (epoch 5/15, val, evaluation) epoch loss per sample: 7.359e+00, epoch sqrt(MSE): sqrt(7.359e+00)=2.713e+00
	Progress: 33.33%, ETA: 0h:11m:20s
----------
2019-07-11 22:02:43 <module> (INFO): (epoch 6/15, batch 1/63, train, running) loss per sample : 8.317e+00, sqrt(MSE): sqrt(6.169e+00)=2.484e+00
2019-07-11 22:03:03 <module> (INFO): (epoch 6/15, batch 32/63, train, running) loss per sample : 6.314e+00, sqrt(MSE): sqrt(4.580e+00)=2.140e+00
2019-07-11 22:03:37 <module> (INFO): (epoch 6/15, train, evaluation) epoch loss per sample: 3.381e+00, epoch sqrt(MSE): sqrt(3.381e+00)=1.839e+00
	Progress: 40.00%, ETA: 0h:9m:51s
----------
2019-07-11 22:03:51 <module> (INFO): (epoch 6/15, val, evaluation) epoch loss per sample: 7.520e+00, epoch sqrt(MSE): sqrt(7.520e+00)=2.742e+00
	Progress: 40.00%, ETA: 0h:10m:12s
----------
2019-07-11 22:03:51 <module> (INFO): (epoch 7/15, batch 1/63, train, running) loss per sample : 5.910e+00, sqrt(MSE): sqrt(4.240e+00)=2.059e+00
2019-07-11 22:04:03 <module> (INFO): (epoch 7/15, batch 20/63, train, running) loss per sample : 6.036e+00, sqrt(MSE): sqrt(4.347e+00)=2.085e+00
2019-07-11 22:04:45 <module> (INFO): (epoch 7/15, train, evaluation) epoch loss per sample: 2.399e+00, epoch sqrt(MSE): sqrt(2.399e+00)=1.549e+00
	Progress: 46.67%, ETA: 0h:8m:48s
----------
2019-07-11 22:04:59 <module> (INFO): (epoch 7/15, val, evaluation) epoch loss per sample: 7.466e+00, epoch sqrt(MSE): sqrt(7.466e+00)=2.732e+00
	Progress: 46.67%, ETA: 0h:9m:4s
----------
2019-07-11 22:04:59 <module> (INFO): (epoch 8/15, batch 1/63, train, running) loss per sample : 3.340e+00, sqrt(MSE): sqrt(2.587e+00)=1.608e+00
2019-07-11 22:05:03 <module> (INFO): (epoch 8/15, batch 7/63, train, running) loss per sample : 4.468e+00, sqrt(MSE): sqrt(3.225e+00)=1.796e+00
2019-07-11 22:05:33 <module> (INFO): (epoch 8/15, batch 55/63, train, running) loss per sample : 5.203e+00, sqrt(MSE): sqrt(3.639e+00)=1.908e+00
2019-07-11 22:05:53 <module> (INFO): (epoch 8/15, train, evaluation) epoch loss per sample: 2.021e+00, epoch sqrt(MSE): sqrt(2.021e+00)=1.422e+00
	Progress: 53.33%, ETA: 0h:7m:44s
----------
2019-07-11 22:06:06 <module> (INFO): (epoch 8/15, val, evaluation) epoch loss per sample: 7.780e+00, epoch sqrt(MSE): sqrt(7.780e+00)=2.789e+00
	Progress: 53.33%, ETA: 0h:7m:56s
----------
2019-07-11 22:06:07 <module> (INFO): (epoch 9/15, batch 1/63, train, running) loss per sample : 2.753e+00, sqrt(MSE): sqrt(1.623e+00)=1.274e+00
2019-07-11 22:06:33 <module> (INFO): (epoch 9/15, batch 42/63, train, running) loss per sample : 4.454e+00, sqrt(MSE): sqrt(3.090e+00)=1.758e+00
2019-07-11 22:07:00 <module> (INFO): (epoch 9/15, train, evaluation) epoch loss per sample: 1.811e+00, epoch sqrt(MSE): sqrt(1.811e+00)=1.346e+00
	Progress: 60.00%, ETA: 0h:6m:38s
----------
2019-07-11 22:07:14 <module> (INFO): (epoch 9/15, val, evaluation) epoch loss per sample: 7.974e+00, epoch sqrt(MSE): sqrt(7.974e+00)=2.824e+00
	Progress: 60.00%, ETA: 0h:6m:48s
----------
2019-07-11 22:07:15 <module> (INFO): (epoch 10/15, batch 1/63, train, running) loss per sample : 2.238e+00, sqrt(MSE): sqrt(1.193e+00)=1.092e+00
2019-07-11 22:07:33 <module> (INFO): (epoch 10/15, batch 30/63, train, running) loss per sample : 3.447e+00, sqrt(MSE): sqrt(2.466e+00)=1.570e+00
2019-07-11 22:08:08 <module> (INFO): (epoch 10/15, train, evaluation) epoch loss per sample: 1.320e+00, epoch sqrt(MSE): sqrt(1.320e+00)=1.149e+00
	Progress: 66.67%, ETA: 0h:5m:33s
----------
2019-07-11 22:08:22 <module> (INFO): (epoch 10/15, val, evaluation) epoch loss per sample: 7.982e+00, epoch sqrt(MSE): sqrt(7.982e+00)=2.825e+00
	Progress: 66.67%, ETA: 0h:5m:40s
----------
2019-07-11 22:08:23 <module> (INFO): (epoch 11/15, batch 1/63, train, running) loss per sample : 2.176e+00, sqrt(MSE): sqrt(1.507e+00)=1.227e+00
2019-07-11 22:08:33 <module> (INFO): (epoch 11/15, batch 17/63, train, running) loss per sample : 3.751e+00, sqrt(MSE): sqrt(2.600e+00)=1.613e+00
2019-07-11 22:09:16 <module> (INFO): (epoch 11/15, train, evaluation) epoch loss per sample: 1.162e+00, epoch sqrt(MSE): sqrt(1.162e+00)=1.078e+00
	Progress: 73.33%, ETA: 0h:4m:27s
----------
2019-07-11 22:09:30 <module> (INFO): (epoch 11/15, val, evaluation) epoch loss per sample: 8.000e+00, epoch sqrt(MSE): sqrt(8.000e+00)=2.828e+00
	Progress: 73.33%, ETA: 0h:4m:32s
----------
2019-07-11 22:09:31 <module> (INFO): (epoch 12/15, batch 1/63, train, running) loss per sample : 2.442e+00, sqrt(MSE): sqrt(1.631e+00)=1.277e+00
2019-07-11 22:09:33 <module> (INFO): (epoch 12/15, batch 5/63, train, running) loss per sample : 1.758e+00, sqrt(MSE): sqrt(1.215e+00)=1.102e+00
2019-07-11 22:10:03 <module> (INFO): (epoch 12/15, batch 53/63, train, running) loss per sample : 2.994e+00, sqrt(MSE): sqrt(2.043e+00)=1.429e+00
2019-07-11 22:10:24 <module> (INFO): (epoch 12/15, train, evaluation) epoch loss per sample: 1.038e+00, epoch sqrt(MSE): sqrt(1.038e+00)=1.019e+00
	Progress: 80.00%, ETA: 0h:3m:20s
----------
2019-07-11 22:10:38 <module> (INFO): (epoch 12/15, val, evaluation) epoch loss per sample: 8.328e+00, epoch sqrt(MSE): sqrt(8.328e+00)=2.886e+00
	Progress: 80.00%, ETA: 0h:3m:24s
----------
2019-07-11 22:10:39 <module> (INFO): (epoch 13/15, batch 1/63, train, running) loss per sample : 3.947e+00, sqrt(MSE): sqrt(2.650e+00)=1.628e+00
2019-07-11 22:11:03 <module> (INFO): (epoch 13/15, batch 40/63, train, running) loss per sample : 3.203e+00, sqrt(MSE): sqrt(2.219e+00)=1.490e+00
2019-07-11 22:11:32 <module> (INFO): (epoch 13/15, train, evaluation) epoch loss per sample: 8.576e-01, epoch sqrt(MSE): sqrt(8.576e-01)=9.261e-01
	Progress: 86.67%, ETA: 0h:2m:14s
----------
2019-07-11 22:11:46 <module> (INFO): (epoch 13/15, val, evaluation) epoch loss per sample: 8.145e+00, epoch sqrt(MSE): sqrt(8.145e+00)=2.854e+00
	Progress: 86.67%, ETA: 0h:2m:16s
----------
2019-07-11 22:11:46 <module> (INFO): (epoch 14/15, batch 1/63, train, running) loss per sample : 1.580e+00, sqrt(MSE): sqrt(1.173e+00)=1.083e+00
2019-07-11 22:12:03 <module> (INFO): (epoch 14/15, batch 27/63, train, running) loss per sample : 2.338e+00, sqrt(MSE): sqrt(1.536e+00)=1.239e+00
2019-07-11 22:12:40 <module> (INFO): (epoch 14/15, train, evaluation) epoch loss per sample: 8.682e-01, epoch sqrt(MSE): sqrt(8.682e-01)=9.317e-01
	Progress: 93.33%, ETA: 0h:1m:7s
----------
2019-07-11 22:12:54 <module> (INFO): (epoch 14/15, val, evaluation) epoch loss per sample: 8.139e+00, epoch sqrt(MSE): sqrt(8.139e+00)=2.853e+00
	Progress: 93.33%, ETA: 0h:1m:8s
----------
2019-07-11 22:12:54 <module> (INFO): (epoch 15/15, batch 1/63, train, running) loss per sample : 1.035e+00, sqrt(MSE): sqrt(6.572e-01)=8.107e-01
2019-07-11 22:13:03 <module> (INFO): (epoch 15/15, batch 15/63, train, running) loss per sample : 2.682e+00, sqrt(MSE): sqrt(1.904e+00)=1.380e+00
2019-07-11 22:13:33 <module> (INFO): (epoch 15/15, batch 63/63, train, running) loss per sample : 2.640e+00, sqrt(MSE): sqrt(1.809e+00)=1.345e+00
2019-07-11 22:13:48 <module> (INFO): (epoch 15/15, train, evaluation) epoch loss per sample: 8.063e-01, epoch sqrt(MSE): sqrt(8.063e-01)=8.979e-01
	Progress: 100.00%, ETA: 0h:0m:0s
----------
2019-07-11 22:14:01 <module> (INFO): (epoch 15/15, val, evaluation) epoch loss per sample: 8.122e+00, epoch sqrt(MSE): sqrt(8.122e+00)=2.850e+00
	Progress: 100.00%, ETA: 0h:0m:0s
----------
2019-07-11 22:14:01 <module> (INFO): finished training 15 epochs in 16m:58.8s
2019-07-11 22:14:02 <module> (INFO): loaded weights of best model according to 'min val epoch MSE' criterion: best value 7.249 achieved in epoch 3

Analyzing training statistics

My definitions

  • Running stats - measured each time running stats are logged (set by periond_in_seconds_to_log_loss) for the training dataset samples that were trained since the last running stats logging!
  • Epoch total running stats - averaging all running stats on all the training dataset. Notice that these stats are measured along training, while the weights are being trained!
  • Evaluation stats - measured in each epoch after training completes (model in evaluation mode)! For the training dataset, evaluation stats are optional and can be avoided to save time (by setting force_train_evaluation_after_each_epoch=False).

What to expect

I started this project for my interest, challenge, experience. "Great" results are not any of the goals, and not expected, since:

  • I don't care to play around with hyper-parameters.
  • I demonstrate results on a quite small dataset.
  • I trained only a short time (a few epochs).
  • There are many other features in a profile except for images that are not treated here: age, height, text...
  • Images in each profile are not scored independently as in this notebook, but all together, e.g. few unclear images that don't help to figure out how the person in the profile looks are much less important to the final impression than one clear image. This is why I use self attention to consider all the images in each profile together, in Image sequence self attentive score regression.
In [15]:
# plot_running_stats=True
plot_running_stats=False
plot_epoch_total_running_stats=True
# plot_epoch_total_running_stats=False
plot_loss_in_log_scale=False
# plot_loss_in_log_scale=True
figure_size=(10,4) # (width,height) in inches
# end of inputs ---------------------------------------------------------------

logger.info("remember that even if loss_name='MSE', the loss may include regularization or auxiliary terms (as in inception v3) and therefore may not equal MSE!")
if not (plot_realtime_stats_on_logging or plot_realtime_stats_after_each_epoch):
    fig=plt.figure(figsize=figure_size)
    plt.suptitle('training stats')
    loss_subplot=plt.subplot(1,2,1)
    MSE_subplot=plt.subplot(1,2,2)
    DatingAI.training_stats_plot(stats_dict,fig,loss_subplot,MSE_subplot,plot_loss_in_log_scale,
                        plot_running_stats,plot_epoch_total_running_stats)
2019-07-11 22:14:02 <module> (INFO): remember that even if loss_name='MSE', the loss may include regularization or auxiliary terms (as in inception v3) and therefore may not equal MSE!

Analyzing results

  • In general, the training looks OK according to the results - losses decrease quite smoothly over epochs, the running loss is similar and higher than training evaluation loss, and validation loss is higher than both.
  • Analyzing $\sqrt{MSE}$ gives much more intuition than the loss, since it's in units of profile score. Getting $\sqrt{MSE}_{train}\sim1$ means that on average, it misses only by $\sim1$ score units on every prediction made for unseen data (that it was not trained on), e.g. predicting $3$ or $1$ for a profile that was scored $2$.
  • Score distribution goes from -5 to 5 (excluding 0), so a random guess error would be $\sqrt{MSE}_{rand}\sim5$. Therefore, getting $\sqrt{MSE}_{val}\sim2.6\sim0.5\sqrt{MSE}_{rand}$ is nice, and $\sqrt{MSE}_{train}\sim0.5\sim0.1\sqrt{MSE}_{rand}$ is very nice - means that the signal exists.
  • We see that the model strongly over-fits, unsurprisingly - as I wrote, I trained it for only a short time on a small dataset, and didn't care to optimize the model (simplifying, adding dropout, etc).

Post-training model evaluation

Since return_to_best_weights_in_the_end=True and best_model_criterion='min val epoch MSE' were set, the weights of the minimal validation MSE achieved during training were set in the model after training completed. As logged (or shown in the training stats figure), the best criterion value was achieved in epoch 3. The model with those weights is evaluated here:

In [16]:
distribution_is_continuous=True # target is not continuous, profile scores are discrete
normalization='over total' # heights=counts/sum(discrete_hist)
opacity=0.6
bins=50
# end of inputs ---------------------------------------------------------------

logger.info('started model evaluation')
plt.figure()
for phase in ['train','val']:
    dataloader=dataloaders[phase]
    labels_array,outputs_array,epoch_loss_per_sample=\
                        model_evaluation(model,dataloaders[phase],loss_fn)
    errors_array=labels_array-outputs_array
    DatingAI.easy_hist(errors_array,distribution_is_continuous=distribution_is_continuous,
              bins=bins,normalization=normalization,label=phase,opacity=opacity)
    epoch_MSE=(errors_array**2).mean()
    logger.info('(post-training, %s) loss per sample: %.3e, sqrt(MSE): sqrt(%.3e)=%.3e'%(
                    phase,epoch_loss_per_sample,epoch_MSE,epoch_MSE**0.5))

logger.info('completed model evaluation')
plt.title('training and validation error distributions')
plt.xlabel('errors (targets-predictions)')
plt.legend(loc='best');
2019-07-11 22:14:02 <module> (INFO): started model evaluation
2019-07-11 22:14:17 <module> (INFO): (post-training, train) loss per sample: 5.398e+00, sqrt(MSE): sqrt(5.398e+00)=2.323e+00
2019-07-11 22:14:30 <module> (INFO): (post-training, val) loss per sample: 7.249e+00, sqrt(MSE): sqrt(7.249e+00)=2.692e+00
2019-07-11 22:14:30 <module> (INFO): completed model evaluation

Analyzing the error distributions

  • After returning to the best validation MSE during training, we see in the figure that the training and validation error distributions are quite similar - indeed no strong over-fitting.
  • The validation error distribution looks a bit wider than the training disribution, but the main difference appears to be that they are shifted. Interesting.
  • The error distributions are clearly not discrete, which is somewhat surprising, since the model was trained to predict discrete scores, so it was reasonable to expect errors around discrete target scores. In contrast, in my attentive chararcter-level biLSTM for score regression on text (see notebook here), the error distributions are quite discrete.

Offering to save net weights

In [17]:
if offer_mode_saving and train_model_else_load_weights:
    try: os.mkdir(models_folder_path)
    except FileExistsError: pass # if the folder exists already - do nothing
    
    saving_decision=input('save model weights? [y]/n ')
    if saving_decision!='n':
        model_name_ui=input('name model weights file: ')
        model_weights_file_path=os.path.join(models_folder_path,model_name_ui+'.ptweights')
        if os.path.isfile(model_weights_file_path):
            alternative_filename=input('%s already exists, give a different file name to save, the same file name to over-write, or hit enter to abort: '%model_weights_file_path)
            if alternative_filename=='':
                raise RuntimeError('aborted by user')
            else:
                model_weights_file_path=os.path.join(models_folder_path,alternative_filename+'.ptweights')
        torch.save(model.state_dict(),model_weights_file_path)       
        logger.info('%s saved'%model_weights_file_path)
logger.info('script completed')
2019-07-11 22:14:30 <module> (INFO): script completed